Abstract: In this paper, we address the problem of global transmit power minimization in a self-configuring network where radio devices are required to operate at a minimum signal to interference plus noise ratio (SINR) level. We model the network as a parallel Gaussian interference channel and we introduce a fully decentralized algorithm (based on trial and error) able to statistically achieve a configuration where the performance demands are met. Contrary to existing solutions, our algorithm requires only local information and can learn stable and efficient working points by using only one bit of feedback. We model the network under two different game theoretical frameworks: normal form and satisfaction form. We show that the converging points correspond to equilibrium points, namely Nash and satisfaction equilibria. Similarly, we provide sufficient conditions for the algorithm to converge in both formulations. Moreover, we provide analytical results to estimate the algorithm's performance as a function of the network parameters. Finally, numerical results are provided to validate our theoretical conclusions.
Keywords: Learning, power control, trial and error, Nash equilibrium, spectrum sharing.

I. Introduction 

In this paper, we consider a network where several 
transmitter-receiver pairs communicate through a common 
bandwidth divided into several orthogonal sub-bands, thus, 
subject to mutual interference. All devices must guarantee 
certain quality of service (QoS), expressed in terms of signal 
to interference plus noise ratio (SINR). The behaviour of the 
devices is designed for achieving a stable network operating 
point (equilibrium) where the maximum number of communi- 
cating pairs satisfy their QoS with the minimum global power 
consumption. The network operating point must be achieved by
the radio devices in a fully decentralized way by selecting
their power allocation policy, i.e., selecting a sub-band and a
power level for each transmission. In this scenario, all
communications take place in the absence of a centralized
controller, and neither cooperation nor exchange of information
between different pairs is considered. For instance, this scenario
may model the case of tactical radios and ad hoc networks, where
current solutions for setting power and channel require either
a certain level of cooperation, with exchange of information
between the devices, or a manual setting.
The closest works to ours are [1], [2] and [3]. In [1], variational inequality theory is used to design a centralized power control algorithm for this scenario; in [2], the authors show that, if the assumption of S-modularity holds for the corresponding game, then best response dynamics [4] converges to a generalized Nash equilibrium (GNE); in [3], the authors provide, under the assumption of low interference, a sufficient condition for the convergence of the iterative water-filling algorithm to a GNE. It is worth noting that the works in [1], [2] and [3] assume a compact and convex set of actions, i.e., the possible power allocation (PA) vectors may take any value in the corresponding simplex. Conversely, in our work, we consider a finite action set by quantizing the available powers into a certain number of levels. This is because, in practice, power levels must be expressed with a finite number of bits. Moreover, several authors have pointed out that better global performance (e.g., spectral efficiency) is achieved when the set of PA vectors is substantially reduced [5], [6], [7], [8]. Indeed, this effect has been reported as a Braess-type
paradox [9]. In this paper, our contribution is twofold: first, we 
present a fully decentralized learning algorithm able to keep 
the SINR level above a certain threshold a high proportion 
of the time, by means of only one bit feedback and relying 
only on local information [4]; second, we analytically study
the convergence properties and the convergence point, which
is shown to be an efficient Nash equilibrium (NE) in terms
of global performance. The paper is organized as follows. In
Sec. II we describe the wireless scenario and we formalize
the problem; in Sec. III we model the system as a game in
normal form and in satisfaction form; in Sec. IV we present
the trial and error algorithm as introduced in [10]; in Sec. V
we present a formal analytical study of the convergence
properties (i.e., the expected number of iterations to reach the
NE and the satisfaction equilibrium (SE)), as well as the expected
fraction of time the system is at the NE and at the SE; in Sec. VI
we validate our analysis through numerical simulations; the
paper is concluded in Sec. VII.



II. System model

Let us consider the system described in Fig. 1. Here, a set $\mathcal{K} = \{1, \ldots, K\}$ of transmitter-receiver pairs share a set $\mathcal{C} = \{b^{(1)}, \ldots, b^{(C)}\}$ of orthogonal sub-bands. Transmitter $k$ is allowed to transmit over one sub-band at a time at a given power level. We denote by $p_k \in \mathcal{P}$, with $\mathcal{P} = \{0, \ldots, P_{\max}\}$, $|\mathcal{P}| = Q$, and $b_k \in \mathcal{C}$, the power level and the frequency sub-band chosen by transmitter $k$, respectively.

[Fig. 1 diagram: transmitter (Tx) and receiver (Rx) pairs $k = 1, 2, \ldots, K$.]

Fig. 1. System model

We denote by $p = (p_1, p_2, \ldots, p_K)$ the network power allocation vector, by $b = (b_1, b_2, \ldots, b_K)$ the spectrum occupation vector and by $a = (a_1, a_2, \ldots, a_K)$ a network configuration vector, where $a_k = (p_k, b_k)$. To communicate, pairs have to achieve a sufficient SINR level, i.e., $\mathrm{SINR}_k > \Gamma$, where we denote by $\Gamma$ the minimum SINR threshold allowing transmission. Receivers treat interference as Gaussian noise, thus:



$$\mathrm{SINR}_k(a) = \frac{p_k\, g_{k,k}^{(b_k)}}{\sigma^2 + \sum_{l \in \mathcal{K} \setminus \{k\}} p_l\, g_{l,k}^{(b_k)}\, \mathbb{1}_{\{b_l = b_k\}}}, \qquad (1)$$

where $g_{l,k}^{(b)}$ represents the channel power gain between transmitter $l$ and receiver $k$ over sub-band $b$, $\sigma^2$ is the power of the thermal noise, assumed constant over the whole spectrum, and $\mathbb{1}_{\{\cdot\}}$ represents the indicator function. In our scenario, we assume block-fading channels, i.e., channel realizations are time invariant for the whole transmission. Our objective is the satisfaction of the SINR constraints for the largest possible set of pairs by using the lowest global energy consumption.
Formally, we want the network configuration vector $a^*$ to be a solution of the following optimization problem

$$\min_{a \in \mathcal{A}} \sum_{k=1}^{K} p_k \quad \text{s.t.} \quad \mathrm{SINR}_k(a) > \Gamma \quad \forall k \in \mathcal{K}^*, \qquad (2)$$

where we denote by $\mathcal{K}^* \subseteq \mathcal{K}$ the largest subset of links able to simultaneously achieve a sufficient SINR level. Generally, to achieve this goal, a central controller knowing all the network's parameters is required. In the following sections, we propose a decentralized algorithm that requires no information on the network and steers the system to a solution of (2).
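As an illustration of (1) and (2), the following minimal Python sketch evaluates the SINR of each pair and solves (2) by exhaustive search on a toy network. It is not part of the proposed algorithm: the function names `sinr` and `brute_force`, the data layout `gains[b][l][k]` and all numerical values are assumptions made for this example only.

```python
from itertools import product

def sinr(k, powers, bands, gains, noise):
    """SINR of pair k as in (1); gains[b][l][k] is the gain from Tx l to Rx k on band b."""
    b = bands[k]
    interference = sum(powers[l] * gains[b][l][k]
                       for l in range(len(powers))
                       if l != k and bands[l] == b)
    return powers[k] * gains[b][k][k] / (noise + interference)

def brute_force(levels, n_bands, gains, noise, gamma):
    """Exhaustive search: satisfy as many pairs as possible, then minimize total power."""
    n_pairs = len(gains[0])
    actions = list(product(levels, range(n_bands)))   # (power, band) per pair
    best_key, best_config = None, None
    for config in product(actions, repeat=n_pairs):
        powers = [a[0] for a in config]
        bands = [a[1] for a in config]
        n_satisfied = sum(sinr(k, powers, bands, gains, noise) > gamma
                          for k in range(n_pairs))
        key = (-n_satisfied, sum(powers))             # lexicographic objective
        if best_key is None or key < best_key:
            best_key, best_config = key, config
    return best_config, -best_key[0], best_key[1]

# Toy example (assumed values): 2 pairs, 2 sub-bands, all gains equal to 1.
toy_gains = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
config, n_satisfied, total_power = brute_force(
    levels=[0.0, 0.5, 1.0], n_bands=2, gains=toy_gains, noise=0.1, gamma=2.0)
print(config, n_satisfied, total_power)
```

The search visits $(CQ)^K$ configurations and needs every channel gain, which is why it is only usable as a reference on very small networks.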

III. Game Formulation 

In this section, we model the scenario presented in Sec. II in two different formulations: a normal-form game and a satisfaction-form [11] game.

A. Normal form formulation 

We model the network described above by the game in normal form

$$\mathcal{G} = \left(\mathcal{K}, \mathcal{A}, \{u_k\}_{k \in \mathcal{K}}\right). \qquad (3)$$

Here, $\mathcal{K}$ represents the set of players, $\mathcal{A}$ is the joint set of actions, that is, $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_K$ where $\mathcal{A}_k = \mathcal{C} \times \mathcal{P}$, and we introduce the utility function $u_k : \mathcal{A} \to \mathbb{R}$ defined by:

$$u_k(a) = \frac{1}{1+\beta}\left(\frac{P_{\max} - p_k}{P_{\max}} + \beta\, \mathbb{1}_{\{\mathrm{SINR}_k(a) > \Gamma\}}\right), \qquad (4)$$

where $\beta$ is a design parameter discussed in Sec. V. This function has been designed to be monotonically decreasing with the power consumption, and monotonically increasing with the number of players who achieve the minimum SINR. In the following, we show that, with this utility function, the NE of the game $\mathcal{G}$ can solve the problem stated in (2). Moreover, note that to evaluate $u_k$ each transmitter only requires local information, since $\mathbb{1}_{\{\mathrm{SINR}_k(a) > \Gamma\}}$ can easily be fed back by the receiver with 1 bit.

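Definition 1: (Interdependent game). $\mathcal{G}$ is said to be interdependent if for every non-empty subset $\mathcal{K}^+ \subset \mathcal{K}$ and every action profile $a = (a_{\mathcal{K}^+}, a_{-\mathcal{K}^+})^1$ such that $a_{\mathcal{K}^+}$ is the action profile of all players in $\mathcal{K}^+$, it holds that:

$$\exists\, i \notin \mathcal{K}^+,\ \exists\, a'_{\mathcal{K}^+} \neq a_{\mathcal{K}^+} : \quad u_i(a'_{\mathcal{K}^+}, a_{-\mathcal{K}^+}) \neq u_i(a_{\mathcal{K}^+}, a_{-\mathcal{K}^+}). \qquad (5)$$

In the following, we assume that game $\mathcal{G}$ is interdependent. This is a reasonable assumption, since, physically, this means that no link is isolated from the others. The solution concept used under this formulation is the Nash equilibrium, which we define as follows: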

Definition 2: (Nash equilibrium in pure strategies). An action profile $a^* \in \mathcal{A}$ is a NE of game $\mathcal{G}$ if, $\forall k \in \mathcal{K}$ and $\forall a_k \in \mathcal{A}_k$,

$$u_k(a_k^*, a_{-k}^*) \geq u_k(a_k, a_{-k}^*). \qquad (6)$$



To measure the efficiency of each NE, we introduce the social welfare function, defined by the sum of all individual utilities:

$$W(a) = \sum_{k=1}^{K} u_k(a).$$
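Definition 2 and the social welfare can be checked numerically on a small instance. The sketch below enumerates the pure NE of a toy game by testing every unilateral deviation. It assumes that the utility (4) reads $u_k = \frac{1}{1+\beta}\left(\frac{P_{\max}-p_k}{P_{\max}} + \beta\,\mathbb{1}\right)$, that all channel gains are equal to 1, and it uses hypothetical names (`pure_nash_equilibria`, `social_welfare`) and illustrative parameter values.

```python
from itertools import product

def sinr(k, config, noise):
    """SINR (1) of pair k, assuming every channel gain equals 1."""
    p_k, b_k = config[k]
    interference = sum(p for l, (p, b) in enumerate(config)
                       if l != k and b == b_k)
    return p_k / (noise + interference)

def utility(k, config, p_max, beta, noise, gamma):
    """Utility (4) as read here: power saving plus beta times the 1-bit feedback."""
    satisfied = sinr(k, config, noise) > gamma
    return ((p_max - config[k][0]) / p_max + beta * satisfied) / (1.0 + beta)

def pure_nash_equilibria(actions, n_players, **params):
    """Profiles from which no player strictly gains by deviating alone."""
    equilibria = []
    for config in product(actions, repeat=n_players):
        stable = True
        for k in range(n_players):
            current = utility(k, config, **params)
            for alt in actions:
                deviation = config[:k] + (alt,) + config[k + 1:]
                if utility(k, deviation, **params) > current:
                    stable = False
        if stable:
            equilibria.append(config)
    return equilibria

def social_welfare(config, **params):
    """Sum of the individual utilities."""
    return sum(utility(k, config, **params) for k in range(len(config)))

# Toy game (assumed values): K = 2 players, C = 2 sub-bands, Q = 3 power levels.
actions = list(product([0.0, 0.5, 1.0], [0, 1]))
params = dict(p_max=1.0, beta=3.0, noise=0.1, gamma=2.0)
equilibria = pure_nash_equilibria(actions, 2, **params)
for profile in equilibria:
    print(profile, social_welfare(profile, **params))
```

In this toy instance the equilibria place the two players on different sub-bands at the lowest satisfying power level, which is also the configuration that solves (2).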

B. Satisfaction form 

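The satisfaction form is a game theoretical formulation modelling scenarios where players are exclusively interested in the satisfaction of their individual QoS constraints. Let us define the game as

$$\mathcal{G}' = \left(\mathcal{K}, \mathcal{A}, \{f_k\}_{k \in \mathcal{K}}\right), \qquad (7)$$

where $\mathcal{K}$, $\mathcal{A}$ follow the previous definitions and the satisfaction correspondence $f_k : \mathcal{A}_{-k} \to 2^{\mathcal{A}_k}$ is defined by

$$f_k(a_{-k}) = \{a_k \in \mathcal{A}_k : \mathrm{SINR}_k(a_k, a_{-k}) > \Gamma\}. \qquad (8)$$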



The solution concept used under this formulation is the satisfaction equilibrium (SE) defined as:

Definition 3: (Satisfaction equilibrium). A satisfaction equilibrium of game $\mathcal{G}'$ is an action profile $a' \in \mathcal{A}$ such that $\forall k \in \mathcal{K}$,

$$a'_k \in f_k(a'_{-k}). \qquad (9)$$

Moreover, we measure the effort of player $k$ due to the use of a particular action by using the effort function [11] $\varphi_k : \mathcal{A}_k \to [0, 1]$. We can, then, define an efficient satisfaction equilibrium (ESE) as:

Definition 4: (Efficient satisfaction equilibrium). A satisfaction equilibrium $a'$ is said to be efficient if, $\forall k \in \mathcal{K}$,

$$a'_k \in \arg\min_{a_k \in f_k(a'_{-k})} \varphi_k(a_k). \qquad (10)$$



$^1$ Here, $a_{-\mathcal{K}^+}$ refers to the action profile of all the players that are not in $\mathcal{K}^+$.

In brief, an ESE is an action profile where all players are satisfied and no player may decrease its individual effort by unilateral deviation. Since our optimization problem is to minimize the overall transmit power, we identify the effort by the function: $\varphi_k(a_k) = p_k$.
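Definitions 3 and 4 can be made concrete with a short enumeration. The following sketch lists the SE and the ESE of a toy game, assuming all channel gains equal to 1, the effort $\varphi_k(a_k) = p_k$, and illustrative parameter values; the function names are hypothetical.

```python
from itertools import product

def sinr(k, config, noise):
    """SINR (1) of pair k, assuming every channel gain equals 1."""
    p_k, b_k = config[k]
    interference = sum(p for l, (p, b) in enumerate(config)
                       if l != k and b == b_k)
    return p_k / (noise + interference)

def satisfying_actions(k, config, actions, noise, gamma):
    """f_k(a_{-k}) in (8): actions of player k that meet the SINR threshold."""
    return [alt for alt in actions
            if sinr(k, config[:k] + (alt,) + config[k + 1:], noise) > gamma]

def is_se(config, actions, noise, gamma):
    """Definition 3: every player plays an action of its satisfaction set."""
    return all(config[k] in satisfying_actions(k, config, actions, noise, gamma)
               for k in range(len(config)))

def is_ese(config, actions, noise, gamma):
    """Definition 4: an SE where each player uses a minimum-power satisfying action."""
    if not is_se(config, actions, noise, gamma):
        return False
    for k in range(len(config)):
        feasible = satisfying_actions(k, config, actions, noise, gamma)
        if config[k][0] > min(p for p, _ in feasible):
            return False
    return True

# Toy game (assumed values): K = 2 players, C = 2 sub-bands, Q = 3 power levels.
actions = list(product([0.0, 0.5, 1.0], [0, 1]))
profiles = list(product(actions, repeat=2))
se = [c for c in profiles if is_se(c, actions, noise=0.1, gamma=2.0)]
ese = [c for c in profiles if is_ese(c, actions, noise=0.1, gamma=2.0)]
print(len(se), len(ese))
```

The SE set is larger than the ESE set because an SE only requires the SINR constraint to hold, whereas an ESE additionally rules out any satisfying action with unnecessary power.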

IV. Algorithm Description 

In this section, we briefly describe the trial and error (TE) algorithm introduced in [10], [12]. Later, we characterize the degrees of freedom of the system to fit our scenario. In TE learning, each player $k$ locally implements a state machine. At each iteration $n$, a state is defined by the triplet:

$$z_k(n) = \{m_k(n), \bar{a}_k(n), \bar{u}_k(n)\}, \qquad (11)$$

where $m_k(n) \in \{C, C+, C-, D\}$ represents a "mood", i.e., a characteristic that defines the machine's reaction to its experience of the environment, and $\bar{a}_k \in \mathcal{A}_k$ and $\bar{u}_k \in [0, 1]$ represent a benchmark action and a benchmark utility, respectively. There are four possible moods: content ($C$), watchful ($C-$), hopeful ($C+$), discontent ($D$). In the following, we characterize the behaviour of each player in every possible mood.

• Content

If at stage $n$ player $k$ is content, it uses the benchmark action $\bar{a}_k(n)$ with probability $(1 - \epsilon)$ and experiments with a new action $a'_k(n)$ with probability $\epsilon$. If at stage $n$ the player decided to experiment, at stage $(n+1)$ it evaluates the utility $u'_k(n+1)$ associated with $a'_k(n)$ as follows: if $u'_k(n+1) < \bar{u}_k(n)$, then $z_k(n+1) = z_k(n)$; otherwise, if $u'_k(n+1) > \bar{u}_k(n)$, then, with probability $\epsilon^{G(u'_k(n+1) - \bar{u}_k(n))}$, a new action and utility benchmark are set, i.e., $\bar{u}_k(n+1) = u'_k(n+1)$ and $\bar{a}_k(n+1) = a'_k(n)$, respectively. Here, $G(\cdot)$ must be such that:

$$0 < G(\Delta u) < \frac{1}{2}; \qquad (12)$$

we opt for a linear formulation: $G(\Delta u) = -0.2\,\Delta u + 0.2$.

• Hopeful-Watchful

If player $k$ observes an increase or a decrease in its utility without having experimented at the previous stage, then the mood becomes hopeful or watchful, according to the following rule: (i) if $u_k(n+1) > \bar{u}_k(n)$, then $m_k(n+1) = C+$, $\bar{a}_k(n+1) = \bar{a}_k(n)$ and $\bar{u}_k(n+1) = \bar{u}_k(n)$; (ii) if $u_k(n+1) < \bar{u}_k(n)$, then $m_k(n+1) = C-$, $\bar{a}_k(n+1) = \bar{a}_k(n)$ and $\bar{u}_k(n+1) = \bar{u}_k(n)$. If player $k$ observes an improvement also at the next stage (i.e., $u_k(n+2) > \bar{u}_k(n+1)$), then the mood switches to content and the benchmark utility is updated with the new one: $m_k(n+2) = C$ and $\bar{u}_k(n+2) = u_k(n+1)$. On the contrary, if a loss is observed also at the next stage (i.e., $u_k(n+2) < \bar{u}_k(n+1)$), then the mood switches to discontent, $m_k(n+2) = D$.

• Discontent

If player $k$ is discontent, it experiments with a new action $a'_k(n)$ at each step $n$. We refer to this behaviour as noisy search. When the corresponding utility $u'_k(n+1)$ is observed, with probability $p = \epsilon^{F(u'_k(n+1))}$ the mood turns to content, $m_k(n+1) = C$, and a new action and utility benchmark are set up, $\bar{u}_k(n+1) = u'_k(n+1)$ and $\bar{a}_k(n+1) = a'_k(n)$, while, with probability $(1-p)$, it continues the noisy search. Note that function $F$ must be such that

$$0 < F(u) < \frac{1}{2K}; \qquad (13)$$

we opt for a linear formulation: $F(u) = -\frac{0.2}{K}\, u + \frac{0.2}{K}$.

A. Algorithm properties 

Hereunder, we restate Theorem 1 in [10] and Theorem 1 in [12] using our notation.

Theorem 1: Let $\mathcal{G}$ have at least one pure Nash equilibrium and let $\epsilon$ be small enough. Then, a pure Nash equilibrium is played at least $(1 - \delta)$ of the time.

This theorem introduces a different notion of convergence. Generally [4], we say that an algorithm converges when it approaches a certain solution as $n \to \infty$ while, here, it means that this solution is played for a high proportion of the total time.

Theorem 2: Let $\mathcal{G}$ have at least one pure Nash equilibrium and let each player employ TE, then a Nash equilibrium that maximizes the sum utility among all equilibrium states is played a large proportion of the time.

Note that, generally, different equilibria are associated with different social welfare values. Learning algorithms available in the literature [4], [13] do not always take into consideration the problem of equilibrium selection, which is a central issue when aiming at global performance.

V. Main results 

A. Working point properties 

In this section, we present our results based on the previous analysis. Proofs are omitted due to space constraints. Based on the game theoretical formulation in Sec. III and the algorithm properties in Sec. IV, we state the following:

Theorem 3: Let $\mathcal{N} \neq \emptyset$ be the set of NE of $\mathcal{G}$, let $\beta > K$ and let us denote by $K_l$ the number of players satisfied at the $l$-th NE. Then, TE converges to the NE where $K_l$ is maximized.

This theorem states that, if $\beta > K$, then TE converges to a state where the largest possible number of players are satisfied and are at the NE. Here, $\beta$ represents the interest a network designer has in satisfying the largest set of players over the minimization of the network power consumption. The next two theorems allow us to link this result with the original global design problem expressed in (2).

Theorem 4: Let (i) $\mathcal{A}^* \neq \emptyset$ be the set of solutions of (2) with $\mathcal{K}^* = \mathcal{K}$, (ii) $\mathcal{N} \neq \emptyset$ be the set of NE of $\mathcal{G}$ with $\mathcal{N} \cap \mathcal{A}^* \neq \emptyset$, and let $\beta > K$. Then, TE learning converges to an action profile $a^*$ such that $a^* \in \mathcal{N} \cap \mathcal{A}^*$ and is an ESE.

This theorem links together the concept of ESE of game $\mathcal{G}'$, the NE of game $\mathcal{G}$ and the solutions of (2). Indeed, when the assumptions are met, the TE algorithm will reach a network state where: (i) all players are satisfied, (ii) the network power consumption is minimized. Note that, generally, it is possible for (2) to have a solution that is not a NE of $\mathcal{G}$.

Theorem 5: Let (2) have no solution for $\mathcal{K}^* = \mathcal{K}$ and fix $\beta > K$. Let $K^*$ be the largest number of players that can be simultaneously satisfied and let $\mathcal{K}_m$ be the $m$-th set, such that $|\mathcal{K}_m| = K^*$, where (2) has a solution; let also $\mathcal{A}^*_m$ be the corresponding set of solutions. Let us define $\mathcal{A}^* = \bigcup_m \mathcal{A}^*_m$ and $\mathcal{N} \neq \emptyset$ the set of NE. Then, TE learning converges to an action profile $a^*$ such that $a^* \in \mathcal{N} \cap \mathcal{A}^*$.

The previous theorem states that, when some players cannot satisfy their SINR condition, the TE algorithm selects the subset $\mathcal{K}_m$, among all possible $\mathcal{K}^*$, such that: (i) the highest number of players are satisfied, (ii) the network power consumption is minimized (with the unsatisfied players employing zero power).

Fig. 2. Markov chain describing the TE algorithm in the network. The state $Eq$ represents a state where all players are in equilibrium (i.e., SE or NE). $C_{K-k}$ represents a state where $K - k$ players are using a correct action (i.e., an action that is satisfying or is optimal w.r.t. the others). $D$ represents a state where one player is discontent.

Corollary 1: Let $\frac{P_{\max}\, g_{k,k}^{(b)}}{\sigma^2} > \Gamma$ $\forall k$ and $\forall b$, let $C > K$ and fix $\beta > K$. Then, TE converges to a solution of (2).

Basically, this corollary means that, if transmitters and receivers are satisfiable on each channel (high SNR regime), then TE converges to an optimal working point.



B. Convergence analysis 

The TE algorithm defines a discrete time Markov chain (DTMC) on the set of the states. Studying the behaviour of the algorithm on the complete chain is an intractable problem due to the number of states, transitions and parameters. In the following, we provide an approximated DTMC that allows us to estimate: (a) the expected convergence time to the NE and to the SE, (b) the expected fraction of time the system is at the NE and at the SE. In light of the description made in Sec. IV, we assume the following: (i) the fraction of time spent in the watchful or hopeful states is negligible compared to that spent in the discontent or content states; (ii) at any time, the probability of having more than one discontent player is negligible. In the following, we assume $C > K$ and a simplified channel model, defined as

$$\frac{g_{k,k}^{(c)}}{g_{j,k}^{(c)}} = 1 \quad \forall k, \forall c. \qquad (14)$$

In Sec. VI we will show that these results are good approximations also under less restrictive conditions. The resulting DTMC for studying the TE behaviour is represented in Fig. 2. When interested in the convergence time and occupancy frequency of the NE, state $Eq$ represents the NE, $C_{K-k}$ a state where $K - k$ players are using an individually optimal action and $D$ a state where one player is discontent. The transition probabilities we evaluate are listed hereunder; the detailed description is omitted due to space constraints.



(15) 
(16) 



(C-K+l)) 
CQ 

(C-K+k) (K-iy. 



P{NE,D) = 
P{D,NE) -- 

P{D,CK-k)-- pfc (K-k)\ 

P{CK-k,CK-k-l)= (it-fc)%^.l+G{A«). (18) 

The analysis of this DTMC allows us to state the following theorems:

Theorem 6: The expected number of iterations needed before reaching the NE for the first time, $T_{NE}$, is bounded as follows:

$$T_{NE} < \frac{CQ}{\epsilon^{(1+G(\Delta u))}\,(C-K)} \left[1 + \log\left(\frac{K(C-K+1)}{C+1}\right)\right],$$

$$T_{NE} > \frac{CQ}{\epsilon^{(1+G(\Delta u))}\,(C-K)} \left[\gamma + \log\left(\frac{K(C-K)}{C}\right)\right],$$

where $\gamma \approx 0.577$ is the Euler-Mascheroni constant.
Note that the time needed to converge is directly proportional to the degree of freedom (i.e., $|\mathcal{A}_k| = CQ$) and inversely related to the experimentation probability $\epsilon$. Nonetheless, as we shall see, choosing a large $\epsilon$ increases the instability of the NE and, consequently, degrades the network performance.
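To get a feeling for the orders of magnitude involved, the bounds can be evaluated numerically. The sketch below assumes that the bounds of Theorem 6 read $\frac{CQ}{\epsilon^{1+G(\Delta u)}(C-K)}\left[\gamma + \log\frac{K(C-K)}{C}\right] < T_{NE} < \frac{CQ}{\epsilon^{1+G(\Delta u)}(C-K)}\left[1 + \log\frac{K(C-K+1)}{C+1}\right]$ with $G(\Delta u) = -0.2\Delta u + 0.2$; the function name `t_ne_bounds` and the value chosen for $\Delta u$ are assumptions of this example.

```python
import math

EULER_GAMMA = 0.5772156649015329

def G(delta_u):
    """Linear acceptance exponent G(du) = -0.2*du + 0.2."""
    return -0.2 * delta_u + 0.2

def t_ne_bounds(K, C, Q, eps, delta_u):
    """Lower and upper bound on the expected hitting time of the NE (assumed form)."""
    if C <= K:
        raise ValueError("the analysis assumes C > K")
    scale = C * Q / (eps ** (1.0 + G(delta_u)) * (C - K))
    lower = scale * (EULER_GAMMA + math.log(K * (C - K) / C))
    upper = scale * (1.0 + math.log(K * (C - K + 1) / (C + 1)))
    return lower, upper

# Parameter values taken from the simulation section; delta_u = 0.5 is an assumption.
for Q in (6, 8, 10):
    print(Q, t_ne_bounds(K=4, C=5, Q=Q, eps=0.02, delta_u=0.5))
```

Both bounds grow linearly with $Q$ and shrink as $\epsilon$ increases, which matches the trade-off discussed above.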

Theorem 7: The expected fraction of time the system is at a NE, $(1 - \delta)$, is:

$$(1 - \delta) = \frac{1}{1 + P(NE, D)\, T_{BNE}}, \qquad (19)$$



where 



K 



P{D,NE) 



fc=i 



CQ 



7 + log 



el+G(Au) (c _ A-) V ■ -""V C + 1 
K 



TcNE{k) = 
P{D,D) = 1- P{D,NE)-^P{D,Ck 



K{C-k+l) 



k=l 



Here, $(1 - \delta)$ depends on $\epsilon$ through (15). This means that the larger the $\epsilon$, the shorter the time the system is at a NE. To evaluate the convergence time and occupancy frequency of the SE we, again, make use of Fig. 2. In this case, state $Eq$ represents the SE, $C_{K-k}$ is a state where $K - k$ players are satisfied and $D$ a state where one user is discontent. The corresponding transition probabilities are listed hereunder



K{K-lfe' 
C2 



jC-K+l) 



P{SE,D) = 

P{D,SE)= ^ 

P{D,CK-k)= ^\^^AC-K + k) 

P{CK-k.CK-k-i)= {K-k)Qs ^^ % 



(20) 

(21) 
(22) 

(23) 



Given the model in (14) and $C > K$, the term $Q_s < Q$ represents the number of power quantization levels that a player can employ to successfully achieve $\mathrm{SINR}_k > \Gamma$ on any free channel. We can, then, state the following theorems:



Theorem 8: The expected number of iterations needed before reaching the SE for the first time, $T_{SE}$, is bounded as follows:

$$T_{SE} < \frac{CQ/Q_s}{\epsilon^{(1+G(\Delta u))}\,(C-K)} \left[1 + \log\left(\frac{K(C-K+1)}{C+1}\right)\right],$$

$$T_{SE} > \frac{CQ/Q_s}{\epsilon^{(1+G(\Delta u))}\,(C-K)} \left[\gamma + \log\left(\frac{K(C-K)}{C}\right)\right].$$

Under assumption (14), being satisfied is a weaker condition than being at the NE, thus, it results that $T_{NE} > T_{SE}$. Predictably, a larger $P_{\max}$ and a lower $\Gamma$, by increasing $Q_s$, improve the convergence speed.
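The dependence of $Q_s$ on $P_{\max}$ and $\Gamma$ can be illustrated with a few lines of Python. The sketch assumes uniformly spaced power levels $\{0, P_{\max}/(Q-1), \ldots, P_{\max}\}$ and a unit direct gain on an interference-free channel; neither assumption is stated in the text, and the function name `satisfying_levels` is hypothetical.

```python
def satisfying_levels(p_max, Q, noise, gamma, gain=1.0):
    """Count the power levels whose interference-free SINR exceeds gamma (Q_s)."""
    levels = [p_max * q / (Q - 1) for q in range(Q)]   # assumed uniform quantization
    return sum(1 for p in levels if p * gain / noise > gamma)

# Illustrative values: a lower threshold or a larger maximum power increases Q_s.
print(satisfying_levels(p_max=1.0, Q=5, noise=0.1, gamma=2.0))
print(satisfying_levels(p_max=1.0, Q=5, noise=0.1, gamma=6.0))
print(satisfying_levels(p_max=2.0, Q=5, noise=0.1, gamma=6.0))
```

Since the zero power level never satisfies the constraint, $Q_s$ is at most $Q - 1$ under these assumptions.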

Theorem 9: The expected fraction of time the system is at a SE, $F_{SE}$, is:

$$F_{SE} = \frac{1}{1 + P(SE, D)\, T_{BSE}}, \qquad (24)$$



where 



K 



fc=l 



TcSEik) 
P{D,D) 



CQ 



K{C -k + l] 



e{C - K)Qs \ ' V C'+l 

K 

1- P{D,SE)-Y,nD,CK- 



k=l 



VI. Simulation results 



The purpose of this section is threefold. First, we run simulations to numerically validate the DTMCs introduced in Sec. V; second, we validate the results on more general channel models; third, we evaluate the performance of the algorithm in terms of satisfaction and power employed. The first two experiments have been run for two different sets of parameters. The first set is composed of $K = 3$, $C = 4$, $\epsilon = 0.02$ and $6 \leq Q \leq 10$. The second set is composed of $K = 4$, $C = 5$, $\epsilon = 0.02$ and $6 \leq Q \leq 10$. In our first experiment, we run 10^ iterations to estimate $(1 - \delta)$ under two different channel models: the simple channels expressed in (14) and a Rayleigh channel. The results are summarized in Fig. 3. As we can see, the analysis, carried out for a particular channel model, proves to be sufficiently precise also under more general formulations. In our second experiment, we estimated the convergence time and compared it with the analytical results in Fig. 4. As we can see, increasing the action set dimension, i.e., increasing $C$ or $Q$, slows down convergence, since the algorithm requires more time to explore all the possibilities. Note that, here, convergence time means the time needed by the system to work at the NE for the first time. The third experiment's parameter set is composed as follows: $K = 4$, $C = 5$, $\epsilon = 0.02$, $Q = 8$, with the simplified channel model as in (14). Here, we have run 10^ tests, each one composed of 6000 iterations of TE. The results are shown in Fig. 5, where the upper curve represents the fraction of players satisfied, while the lower curve represents the ratio between the average power employed by the network and the optimal power that should be employed to satisfy all the players. On average, in accordance with Fig. 3, the system reaches an optimal equilibrium (all players satisfied



[Fig. 3 plot. Curves: simulation with simplified channel, theoretical curve and simulation with Rayleigh channel, each for K = 3, C = 4 and for K = 4, C = 5. Horizontal axis: quantization step Q.]

Fig. 3. Fraction of time the system is at the NE. Comparison between the theoretical line and simulation results for two sets of data and different channels: Rayleigh and the simple one as in (14).

[Fig. 4 plots. Curves: upper bound, simulation results and lower bound, for K = 4, C = 5 (left) and for K = 3, C = 4 (right). Horizontal axis: quantization step Q.]

Fig. 4. Expected time for the system to reach the Nash equilibrium. On the left, results for a system with K = 4 players and C = 5 channels. On the right, results for a system with K = 3 players and C = 4 channels.



and minimum amount of power employed) after around 2200 iterations. Note that, even though this number may be too high for some specific scenarios, the configurations selected by the algorithm before convergence are only slightly inefficient. Indeed, a configuration where all the players are able to satisfy their SINR constraints is reached, on average, after 600 iterations. Moreover, before this, we observe that only a fraction of the satisfiable players is satisfied, in spite of the amount of power used.

Fig. 5. Average fraction of players satisfied versus iterations, and ratio between the average power employed and the optimal level of power that should be employed to satisfy the whole network. Available number of channels C = 5, number of players K = 4, quantization level Q = 8.

VII. Conclusion

In this paper, we have studied a power control problem in a self-configuring decentralized network. We presented a new decentralized algorithm able to steer the network into a working point where the maximum number of transmitter-receiver pairs achieves a sufficient SINR while minimizing the network power consumption. The algorithm does not assume any prior knowledge of the network and can learn an efficient equilibrium with only one bit of feedback. By assuming a particular channel realization, we have analytically estimated the expected performance of the algorithm through a Markov chain description of the algorithm's behaviour. Finally, we have shown through Monte-Carlo simulations that the analysis is approximately correct also for general channel models.

VIII. Acknowledgement 

This research work was carried out in the framework of the 
CORASMA EDA Project B-0781-IAP4-GC. 



[1] H. Iiduka, "Fixed point optimization algorithm and its application to power control in CDMA data networks," Mathematical Programming, Oct. 2010.

[2] E. Altman and Z. Altman, "S-modular games and power control in wireless networks," IEEE Trans. Automat. Contr., vol. 48, no. 5, pp. 839-842, May 2003.

[3] J.-S. Pang, G. Scutari, F. Facchinei, and C. Wang, "Distributed power allocation with rate constraints in Gaussian parallel interference channels," IEEE Trans. on Info. Theory, vol. 54, no. 8, pp. 3471-3489, Aug. 2008.

[4] L. Rose, S. Lasaulce, S. M. Perlaza, and M. Debbah, "Learning equilibria with partial information in decentralized wireless networks," IEEE Communications Magazine, vol. 49, no. 8, pp. 136-142, Aug. 2011.

[5] O. Popescu and C. Rose, "Water filling may not good neighbors make," in Proceedings 2003 IEEE Global Telecommunications Conference - GLOBECOM 03, 2003, pp. 1766-1770.

[6] L. Rose, S. M. Perlaza, and M. Debbah, "On the Nash equilibria in decentralized parallel interference channels," in IEEE Workshop on Game Theory and Resource Allocation for 4G, Kyoto, Japan, Jun. 2011.

[7] E. Altman, V. Kamble, and H. Kameda, "A Braess type paradox in power control over interference channels," 1st International ICST Workshop on Physics Inspired Paradigms for Wireless Communications and Network, May 2008.

[8] S. M. Perlaza, M. Debbah, S. Lasaulce, and H. Bogucka, "On the benefits of bandwidth limiting in decentralized vector multiple access channels," in Proc. 4th Intl. Conf. on Cognitive Radio Oriented Wireless Networks and Comm. (CROWNCOM), May 2009.

[9] D. Braess, A. Nagurney, and T. Wakolbinger, "On a paradox of traffic planning," Transportation Science, vol. 39, pp. 446-450, November 2005.

[10] H. P. Young, "Learning by trial and error," Tech. Rep., 2008.

[11] S. M. Perlaza, H. Tembine, S. Lasaulce, and M. Debbah, "Quality-of-service provisioning in decentralized networks: A satisfaction equilibrium approach," IEEE Journal of Selected Topics in Signal Processing, Feb. 2012.

[12] B. S. Pradelski and H. P. Young, "Efficiency and equilibrium in trial and error learning," Tech. Rep., 2010.

[13] G. Scutari, D. Palomar, and S. Barbarossa, "Optimal linear precoding strategies for wideband non-cooperative systems based on game theory - part II: Algorithms," IEEE Trans. on Signal Processing, vol. 56, no. 3, pp. 1250-1267, Mar. 2008.



